Among the several approaches to detecting Parkinson's disease is the one based on the speech signal,
since speech impairment is a symptom of this disease. This paper focuses on signal analysis and uses
a dataset of voice records in which the patients were asked to utter the vowels “a”, “o”, and “u”.
The discrete wavelet transform (DWT) is applied to the speech signal to obtain a variable resolution
that can reveal the most important information about the patients. From the approximation a3
obtained with the Daubechies wavelet of level 2 at the 3rd scale, 21 features are extracted: linear
predictive coding (LPC), energy, zero-crossing rate (ZCR), mel frequency cepstral coefficients
(MFCC), and wavelet Shannon entropy. For classification, the K-nearest neighbour (KNN) classifier
is used, alongside ensemble learning. KNN is a type of instance-based learning that makes decisions
based on locally approximated functions. However, the choice of training features can have a
significant impact on the overall learning process. Hence the role of the genetic algorithm (GA),
which selects the training features that give the most accurate classification.
1. INTRODUCTION

Parkinson's disease is a progressive degenerative neurological disease that affects the motor system.
It is due to a lack of dopamine and the death of cells in the substantia nigra, the cause of which is
unknown but is likely to involve environmental and genetic factors. Parkinson's disease therefore
seriously damages the vital functions of many organs. Among them is the larynx, whose impairment
causes abnormalities in the speech signal [1]-[3]. Recently, much research has turned to speech
processing to detect Parkinson's disease, employing acoustical and decompositional features. The
wavelet transform has been used to tackle the problem of constant resolution. Drissi et al. [4] applied
different types of discrete wavelet transform (DWT) to the speech signal to obtain the mel frequency
cepstral coefficients (MFCC), then classified those coefficients using a support vector machine (SVM).

In [5], [6], the authors used the Daubechies level 2 wavelet at the 3rd scale, which gave the best
results in [4], to extract the MFCC, with two SVM kernels: linear and radial basis function (RBF).
The accuracy reported there was obtained with the RBF kernel. In 2020, Soumaya et al. [7] used the
K-nearest neighbour (KNN) classifier instead; their predicting system reached an accuracy of 98.68%.
The MFCC, linear predictive coding (LPC), energy, zero-crossing rate (ZCR), and Shannon entropy
features have been used in many speech signal studies, either for the detection of Parkinson's disease,
as in [8], [9], or for recognition [10]. Oung et al. [11] proposed a detection and classification
system for Parkinson's disease centered on the empirical wavelet transform (EWT) and the empirical
wavelet packet transform (EWPT). The aim of using EWT/EWPT was to decompose the signals from wearable
motion and audio sensors. Three classifiers, KNN, probabilistic neural network (PNN), and extreme
learning machine (ELM), were applied to analyze the performance of the algorithm. An accuracy of more
than 90% was obtained by EWT/EWPT-ELM on the signals from the motion and audio sensors, respectively.

Journal homepage: http://ijece.iaescore.com
Int J Elec & Comp Eng, ISSN: 2088-8708

The genetic algorithm (GA) plays a major role in solving optimization problems. In [9], [10], the
genetic algorithm is applied to select suitable features in order to reach the most accurate
prediction system. Umar and Felemban [12] used the GA to execute cyber attacks, namely false data
injection attacks (FDIA), in power systems. As is the case with the GA and the KNN classifier, which
have been used in the identification of COVID-19 [13]-[15], several methods are used in
classification systems based on machine learning and dimension reduction [1], [16]-[22]. The
remainder of this paper is organized as follows: section 2.1 sheds light on the problem statement and
the material used. Section 2.2 outlines the proposed method. Section 3 provides the obtained results
and a discussion of them. Finally, the last section concludes the paper.


2. RESEARCH METHOD 
2.1. Problem statement and material 

The data used in this paper were collected by Sakar et al. [23] with a 16-bit sound card in a
desktop computer. Using a standard microphone with a sampling frequency of 44,100 Hz, the patients were
asked to utter the vowels “a”, “o”, and “u”. The records were made in stereo-channel mode and then saved
in WAV format. The data contain 34 voice records of the vowel “a”, 30 of “o”, and 29 of “u”.

Diagnosis is the main and particular task in the troubleshooting process. In this paper, the purpose
is to build a classification system for Parkinson's disease centered on the speech signal. In this
framework, two key issues are worth mentioning: i) finding the best speech processing for extracting
the features that carry the medical information of the patients, as well as choosing the appropriate
vowel for Parkinson's disease detection; and ii) selecting features, so as to identify the most
relevant ones with an optimization algorithm, and using ensemble learning for classification. The
following section explains the key details of our approach.


2.2. Proposed method 

Signal transformation, feature extraction, feature selection, and classification are the four main
steps of the proposed method. This enhanced method seeks to detect Parkinson's disease based on speech
signal processing. As depicted in Figure 1, the method starts with the transformation of the speech
signal by the Daubechies level 2 wavelet (db2) at the 3rd scale, followed by pre-emphasis applied to
the a3 approximation. The 21 extracted acoustical and decompositional features are then concatenated.
The GA is used as an optimization algorithm to select the features, together with the KNN classifier.

The proposed method starts by transforming the speech signal using mathematical transformations.
In this respect, despite the effectiveness of the Fourier transform (FT) and the short-time Fourier
transform (STFT), their fixed resolution remains a shortcoming. The STFT, which uses a single fixed
window [5], gives a two-dimensional time-frequency map of the signal. With this resolution, the best
we can do is identify the spectral components that exist within a given time interval; it cannot tell
precisely which spectral component exists at a given time instant. We therefore turned to wavelet
transforms (WT), which overcome this deficiency of the STFT through multi-resolution analysis.
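
As an illustration of the multi-resolution idea, the following is a minimal sketch of a three-level decomposition producing a3, d3, d2, and d1. It assumes that the wavelet used is the standard 4-tap Daubechies filter (db2) and that the signal is extended periodically at its boundaries; the function names `dwt_step` and `wavedec3` are hypothetical, and this is not the MATLAB implementation used in the paper.

```python
import math

# Standard 4-tap Daubechies (db2) low-pass analysis filter.
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
LOW = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
# Quadrature-mirror high-pass filter derived from the low-pass taps.
HIGH = [LOW[3], -LOW[2], LOW[1], -LOW[0]]


def dwt_step(signal):
    """One DWT level: filter, then keep every second output (periodic extension assumed)."""
    n = len(signal)
    approx, detail = [], []
    for i in range(0, n, 2):
        window = [signal[(i + k) % n] for k in range(4)]
        approx.append(sum(c * v for c, v in zip(LOW, window)))
        detail.append(sum(c * v for c, v in zip(HIGH, window)))
    return approx, detail


def wavedec3(signal):
    """Three-level decomposition returning (a3, d3, d2, d1)."""
    a1, d1 = dwt_step(signal)
    a2, d2 = dwt_step(a1)
    a3, d3 = dwt_step(a2)
    return a3, d3, d2, d1
```

Each level halves the number of samples, so a3 describes the signal at one eighth of the original sampling rate while d1, d2, and d3 keep the finer details removed at each level.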

Before the extraction of the formant frequencies and the MFCC, the transformed signal undergoes
pre-processing. The first step is the pre-emphasis phase, performed by the filter shown in (1). The
signal is then divided into frames, each of which is multiplied by the Hamming window given in (2) to
minimize the discontinuities introduced by framing.


$$x(n) = a_3(n) - k\,a_3(n-1) \tag{1}$$

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right) \tag{2}$$
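
The pre-processing of (1) and (2) can be sketched as follows. This is only an illustration: the pre-emphasis coefficient k = 0.97, the frame length, and the hop size are assumptions (the text does not state them), and the function names are hypothetical.

```python
import math


def pre_emphasis(a3, k=0.97):
    """x(n) = a3(n) - k * a3(n - 1); the first sample is passed through unchanged."""
    return [a3[0]] + [a3[n] - k * a3[n - 1] for n in range(1, len(a3))]


def hamming(length):
    """w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)) for n = 0 .. N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def windowed_frames(x, frame_len, hop):
    """Split x into (possibly overlapping) frames and taper each with a Hamming window."""
    w = hamming(frame_len)
    return [
        [x[start + n] * w[n] for n in range(frame_len)]
        for start in range(0, len(x) - frame_len + 1, hop)
    ]
```

The window tapers both ends of every frame towards a small value, which is what reduces the discontinuities at the frame boundaries.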


Concerning the acoustical features, the formant frequencies are extracted using the LPC presented
in [9], which gives an accurate representation of the speech parameters. The sum of the squared samples
represents the time-domain energy of the signal, whereas the rate at which the signal changes its sign
during a specific time period (a frame) is the zero-crossing rate (ZCR). Energy and ZCR are calculated
using (3) and (4), respectively.

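$$E = \sum_{n=1}^{N} |x_1(n)|^2 \tag{3}$$

$$ZCR = \frac{1}{2}\left[\sum_{n=1}^{N} \left|\operatorname{sgn}(x_1(n+1)) - \operatorname{sgn}(x_1(n))\right|\right] \tag{4}$$

where x_1 is the windowed frame signal, sgn denotes the sign function, and n = 1, 2, ..., N indexes the
samples of the frame, N being the length of the windowed frame.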
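
A minimal sketch of the two frame-level measures referred to as (3) and (4) is given below. It assumes the common convention in which half of the absolute sign differences between consecutive samples are summed, so that each sign change counts once; the function names are hypothetical.

```python
def frame_energy(frame):
    """Time-domain energy: sum of the squared samples of the frame."""
    return sum(v * v for v in frame)


def sign(v):
    """Sign function, treating zero as positive."""
    return 1 if v >= 0 else -1


def zero_crossing_rate(frame):
    """Half the summed sign differences: each sign change contributes exactly 1."""
    return 0.5 * sum(
        abs(sign(frame[n + 1]) - sign(frame[n])) for n in range(len(frame) - 1)
    )
```

Dividing the returned count by the frame length would give a rate per sample; whether such a normalization is applied is not stated in the text.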
Figure 1. The flowchart of the proposed predicting system


A set of 12 MFCC is extracted by cepstral analysis of the speech signal, as depicted in Figure 2.
First, the signal goes through the pre-processing shown in Figure 1, followed by the FFT and filtering
by the filter bank. The cepstral analysis then comes just before the extraction of the 12 MFCC
coefficients. The reason for performing the cepstral analysis in the log-spectral domain [23] is to cope
with the convolution of the source with the vocal tract response. This convolution becomes a product in
the spectral domain, so we resort to the logarithm to turn this product into a sum that the cepstral
analysis can separate.
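
The separation argument can be written out in one line. The symbols below are assumptions introduced for illustration: s(n) is the speech signal, e(n) the source (excitation), and h(n) the vocal tract impulse response, with capital letters denoting their spectra.

```latex
s(n) = e(n) * h(n)
\;\Longrightarrow\;
S(\omega) = E(\omega)\,H(\omega)
\;\Longrightarrow\;
\log\lvert S(\omega)\rvert = \log\lvert E(\omega)\rvert + \log\lvert H(\omega)\rvert
```

Since the two contributions are additive after the logarithm, a further linear transform of the log spectrum (the cepstral analysis) keeps them as separate additive terms.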
Figure 2. The MFCC extraction process (block labels recovered from the figure: pre-processing, cepstral domain)
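
The chain FFT, filter bank, logarithm, and cepstral transform can be sketched for a single pre-processed frame as follows. This is an illustrative sketch only: the mel-scale formula, the 26 triangular filters, the use of a DCT-II as the cepstral transform, and the `sample_rate` argument are assumptions not specified in the text, and a naive DFT is used in place of an FFT to keep the example dependency-free.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0 .. N/2 (adequate for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]  # fractional bins
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if left < k <= centre:
                row.append((k - left) / (centre - left))
            elif centre < k < right:
                row.append((right - k) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=26, n_coeffs=12):
    """Log mel-filterbank energies followed by a DCT-II; coefficient 0 is skipped."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-12) for row in bank]
    return [
        sum(log_e[j] * math.cos(math.pi * c * (j + 0.5) / n_filters)
            for j in range(n_filters))
        for c in range(1, n_coeffs + 1)
    ]
```

When the frames come from the a3 approximation, the effective sampling rate is lower than that of the original recording, so the value passed as `sample_rate` would have to reflect the decomposition level.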


The decompositional features, expressed through the Shannon entropy, are calculated for the
approximation a3 and for the details d3, d2, and d1. The Shannon entropy is an important part of
information theory that describes the degree of disorder of a system: the more orderly the system, the
smaller the entropy. The Shannon entropy H is defined as in (5):


$$H = -\sum_{i} p_i \log_2(p_i) \tag{5}$$
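
A minimal sketch of the entropy of one set of wavelet coefficients is shown below. The text does not say how the probabilities p_i are obtained; the sketch assumes they are the normalized energies of the coefficients, and it assumes a base-2 logarithm. The function name is hypothetical.

```python
import math


def shannon_entropy(coeffs):
    """Entropy of the normalized energy distribution of a coefficient sequence."""
    total = sum(c * c for c in coeffs)
    if total == 0:
        return 0.0
    probs = [c * c / total for c in coeffs]
    # Terms with p = 0 contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

Applied in turn to a3, d3, d2, and d1, this yields the four decompositional features; energy concentrated in a single coefficient gives 0, and energy spread evenly over four coefficients gives 2 bits.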


The last step of the feature extraction phase is to concatenate the acoustical feature vector of size
1x17 and the decompositional feature vector of size 1x4, constructing a 1x21 vector for every
individual. Afterward, feature selection is carried out with the KNN classifier embedded within the GA
optimization. The GA is inspired by the genetic mechanism of nature and the theory of biological
evolution. Its main objective is to find the optimal solution by simulating the process of natural
evolution. It starts by randomly generating an initial population in which each individual is a
candidate solution; the fitness function, which measures how accurate the system is, is then
calculated. The population then evolves generation after generation through the genetic operators of
crossover, mutation, and selection until the predetermined number of iterations is reached.

In this study, the 1x21 feature vector extracted from each patient is represented as a chromosome,
and each of its features as a gene. Each gene takes a label of 1 or 0. A new population is therefore
generated by keeping the genes labelled 1 in the initial chromosome and weeding out those labelled 0.
The resulting matrix is the input of the classification block, where either the KNN classifier with
10-fold cross-validation or ensemble learning with KNN is used. The more accurate the system, the
better the individual is as a solution. KNN falls under the category of lazy learner approaches [15].
The KNN algorithm applies the Euclidean distance to find the K nearest neighbors of the test
samples [7].


$$d(v, u) = \sqrt{\sum_{i=1}^{M} (v_i - u_i)^2} \tag{6}$$


The Euclidean distance d(v, u) between two points v and u is calculated using (6), where M is the
number of characteristics, so that v = {v1, v2, v3, ..., vM} and u = {u1, u2, u3, ..., uM}. To boost
the performance of the KNN classifier, we use ensemble learning methods. The basic concept of this
technique is to train multiple base learners as ensemble members. Their predictions are combined into
a single output that, provided the members' errors are uncorrelated, should on average perform better
on the target data sets than any single ensemble member. In this study, two classification systems are
used: the first is KNN with K=1, and the second is KNN trained by building a model with the random
subspace method (RSM) [24]-[26].
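
The two classification systems can be sketched as follows, under stated assumptions. The fold assignment (every `folds`-th sample), the number of ensemble members, the subspace size, and all function names are hypothetical choices made for illustration; they are not taken from the MATLAB implementation used in this work.

```python
import math
import random


def euclidean(v, u):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, u)))


def apply_mask(row, mask):
    """Keep only the features whose gene is 1."""
    return [x for x, bit in zip(row, mask) if bit]


def predict_1nn(train_x, train_y, sample):
    """Label of the single closest training sample (K = 1)."""
    nearest = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], sample))
    return train_y[nearest]


def cv_accuracy(features, labels, mask, folds=10):
    """Cross-validated 1-NN accuracy for one chromosome (binary feature mask)."""
    data = [apply_mask(row, mask) for row in features]
    correct = 0
    for fold in range(folds):
        test_idx = set(range(fold, len(data), folds))
        train_x = [data[i] for i in range(len(data)) if i not in test_idx]
        train_y = [labels[i] for i in range(len(data)) if i not in test_idx]
        for i in test_idx:
            correct += predict_1nn(train_x, train_y, data[i]) == labels[i]
    return correct / len(data)


def random_subspace_predict(train_x, train_y, sample, n_learners=15, subspace=5, seed=0):
    """Majority vote of 1-NN learners, each trained on a random subset of features."""
    rng = random.Random(seed)
    n_features = len(sample)
    votes = {}
    for _ in range(n_learners):
        cols = rng.sample(range(n_features), min(subspace, n_features))
        sub_train = [[row[c] for c in cols] for row in train_x]
        label = predict_1nn(sub_train, train_y, [sample[c] for c in cols])
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

In this sketch, `cv_accuracy` plays the role of the fitness of a chromosome: the mask removes the genes labelled 0 before the distances are computed.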

Then, taking into account the fitness calculated in the classification phase, the parents are chosen
from the current population using roulette wheel selection to create offspring. It is worth noting that
chromosomes (individuals) with higher fitness are more likely to be selected. Crossover and mutation
then produce new offspring. The new offspring go through the same process, that is, calculating the
accuracy of each candidate solution and creating further populations, until the predefined number of
iterations is reached.
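
The evolutionary loop described above can be sketched as follows. The population size, the mutation rate, the use of single-point crossover and bit-flip mutation, and the function names are assumptions for illustration; `fitness` stands for any function that maps a binary chromosome to a classification accuracy.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]


def crossover(a, b, rng):
    """Single-point crossover producing one child from two parents."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(chrom, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def run_ga(fitness, n_genes=21, pop_size=20, iterations=10, mutation_rate=0.05, seed=0):
    """Evolve binary feature masks and return the best one evaluated, with its fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(iterations):
        fitnesses = [fitness(c) for c in population]
        for chrom, fit in zip(population, fitnesses):
            if fit > best_fit:
                best, best_fit = chrom, fit
        # Breed the next generation from roulette-selected parents.
        population = [
            mutate(
                crossover(
                    roulette_select(population, fitnesses, rng),
                    roulette_select(population, fitnesses, rng),
                    rng,
                ),
                mutation_rate,
                rng,
            )
            for _ in range(pop_size)
        ]
    return best, best_fit
```

The best chromosome returned is the feature mask with the highest fitness seen across all evaluated generations.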


3. RESULTS AND DISCUSSION 

Within the aforementioned framework, this paper sheds light on feature selection in a predicting
system for Parkinson's disease (PD) patients and aims to identify the most suitable vowel for detecting
PD. The experimental results were produced by the predicting system running in MATLAB 2018a on a laptop
with an Intel Core i5-6300U CPU (2.4 GHz, 4 CPUs), 8 GB of RAM, and an SSD.

Based on the hybrid method of time-frequency properties and SVM proposed in [7], the speech signal
is transformed by the Daubechies level 2 wavelet (db2) at the 3rd scale. The next step is the
extraction of the acoustical and decompositional features; concatenating them yields the data matrix
used in this paper. The genetic algorithm then intervenes to select the best possible solution. In this
study, we work with the records of each vowel alone (“a”, “o”, “u”), then with the vowels two by two,
and finally with all three together, as shown in Table 1.

For each combination, 50 attempts were carried out, with the algorithm run once per attempt: the
number of iterations ranged from 1 to 10, and five attempts were executed for each number of
iterations. First, the population is initialized randomly; afterward, the offspring generated by the
genetic algorithm are used in the following iterations. To calculate the fitness function, the
classification step uses the KNN classifier and ensemble learning with KNN to differentiate between
healthy and sick subjects. Table 2 shows the results obtained by applying the method proposed in this
paper.


Table 1. The number of observations in each combination
Vowels         /a/    /o/    /u/    /a,o/    /a,u/    /o,u/    /a,o,u/
Observations   34     30     30     64       64       60       94


Table 2. Results obtained by using KNN and ensemble learning
Vowels       KNN       Ensemble learning
«a»          91.18%    91.18%
«o»          86.67%    86.67%
«u»          86.21%    82.76%
«a, o»       85.94%    87.5%
«a, u»       85.71%    87.3%
«o, u»       88.14%    86.44%
«a, o, u»    83.87%    84.95%


As presented in Table 2, using each vowel alone (“a”, “o”, and “u”), we obtain accuracies of 91.18%,
86.67%, and 86.21%, respectively, with the KNN classifier. The same results are achieved with ensemble
learning for “a” and “o”, whereas the accuracy for the vowel “u” decreases to 82.76%. Concerning the
combinations of two vowels, “a, u” and “a, o” give almost the same accuracy as each other with both
classification methods: 87.3% and 87.5% with the ensemble learning technique, and 85.71% and 85.94%
with the KNN classifier. With the combination “o, u”, we reach an accuracy of 88.14% with the KNN
classifier. For the last combination, which uses the three vowels “a, o, u”, the best accuracy achieved
is 84.95%, with the ensemble learning technique. These results are depicted in the confusion matrix in
Figure 3.


Figure 3. Confusion matrix


The previous percentages are achieved with a reduced feature vector that contains only 15
coefficients. The GA made it possible to weed out the coefficients with low fitness and keep those with
the highest fitness. In our study, the feature selection phase using the GA reduced the feature vector
from 21 coefficients to only 15. The reduced vector contains the F3 formant frequency, the ZCR, 10 MFCC
coefficients (C1, C2, C5, C6, C7, C8, C9, C10, C11, C12), and the last 3 Shannon entropies, those of the
details d1, d2, and d3.

Regarding the running time of the proposed algorithm, the KNN classifier was faster than the
ensemble learning technique. From the obtained results, we can see that the best results are achieved
with the vowel “a” with both classification methods. Among the other combinations, KNN was also the
best, giving 88.14%, which is the second-best accuracy. Besides, the use of the genetic algorithm with
wavelets increases the accuracy of the predicting system.

Benba et al. [27] proposed a classification of Parkinson's disease using 20 MFCCs with an SVM
classifier, working with the same data used in this paper. Their classification system gives an
accuracy of 82.50% with the MLP kernel of the SVM for the vowel “u”, and 77.50% using the three vowels
at once. In the present work, the proposed method intervenes in two phases: first in signal processing,
by using the DWT to transform the speech signal; then in the feature selection phase, where we resort
to the genetic algorithm, which minimizes the number of features and maximizes the accuracy. The best
result, 91.18%, was obtained with the vowel “a”; in relative terms, this is 10.52% higher than the
result for the vowel “u” in [27], that is, (91.18 - 82.50)/82.50. The combination of the three vowels
gives 84.95%, which is 9.61% higher in relative terms, that is, (84.95 - 77.50)/77.50. This is achieved
while reducing the length of the feature vector from 20 coefficients to 15.


4. CONCLUSION 

The present study was conducted to create a prediction system for Parkinson's disease based on
speech signals (vowels). In this paper, we propose two methods: the first is the GA for feature
selection embedded with the KNN classifier, and the second is GA-KNN with ensemble learning. The speech
signals used to validate the proposed methods are records of Parkinson's disease patients and healthy
subjects uttering the vowels “a”, “o”, and “u”.

First, the signal is transformed using the Daubechies DWT. After that, 21 coefficients are
extracted from the a3 approximation of each record and concatenated into the input matrix of the
predicting system. In the optimization phase, the GA is used to reduce the dimension of the feature
matrix. KNN and ensemble learning are employed to calculate the accuracy of each population generated
by the GA, which serves as the fitness function. Using the KNN classifier, we obtain an accuracy of
91.18% for the vowel “a”. We therefore conclude that the most appropriate vowel for Parkinson's disease
detection is “a”, together with the KNN classifier, which considerably reduces the programming effort
and the execution time.